[CLEAN] Synthetic Benchmark PR #135875 - Allocation: introduce a new decider that balances the index shard count among nodes #19

tomerqodo · 2025-12-04T20:36:00Z

User description

Benchmark PR elastic#135875

Type: Clean (correct implementation)

Original PR Title: Allocation: introduce a new decider that balances the index shard count among nodes
Original PR Description: For a cluster with n data nodes to host an index with m shards, each node ideally should host not significantly more than m / n shards each. This new allocation decider acts on this principle to respond with a NOT_PREFERRED in event of a node being allocated more shards than threshold.

Nodes in shutdown are excluded when computing fair workload for nodes. A load skew tolerance setting is added to permit nodes to own more than ideal number of shards for the index.

Relates ES-12080
Original PR URL: elastic#135875

PR Type

Enhancement

Description

Introduce IndexBalanceAllocationDecider to balance index shard distribution
Add settings to control decider behavior with excess shard tolerance
Register new decider in cluster module allocation pipeline
Extend DiscoveryNodeFilters with hasFilters() utility method
Improve test infrastructure with shared assertion helper

Diagram Walkthrough

flowchart LR
  A["IndexBalanceConstraintSettings<br/>(Configuration)"] --> B["IndexBalanceAllocationDecider<br/>(New Decider)"]
  B --> C["ClusterModule<br/>(Registration)"]
  D["DiscoveryNodeFilters<br/>(hasFilters utility)"] --> B
  E["FilterAllocationDecider<br/>(Exposed constants)"] --> B
  B --> F["Allocation Pipeline<br/>(Shard Balancing)"]

File Walkthrough

Relevant files

Enhancement

6 files

IndexBalanceAllocationDecider.java `New allocation decider for index shard balancing`	+172/-0
ClusterModule.java `Register IndexBalanceAllocationDecider in allocation pipeline`	+2/-0
DiscoveryNodeFilters.java `Add hasFilters utility method to check filter presence`	+4/-0
BalancedShardsAllocator.java `Expose getAllocation and make ProjectIndex record public for testing`	+7/-1
WeightFunction.java `Make calculateNodeWeightWithIndex method public for testing`	+2/-1
FilterAllocationDecider.java `Expose filter prefix constants for reuse by other deciders`	+3/-3

Configuration changes

2 files

IndexBalanceConstraintSettings.java `Settings for index balance decider configuration`	+59/-0
ClusterSettings.java `Register IndexBalanceConstraintSettings in cluster settings`	+3/-0

Tests

4 files

IndexBalanceAllocationDeciderTests.java `Comprehensive test suite for index balance allocation decider`	+365/-0
ClusterModuleTests.java `Update allocation decider order test to include new decider`	+3/-1
BalancedShardsAllocatorTests.java `Update test to use public calculateNodeWeightWithIndex method`	+1/-1
ESAllocationTestCase.java `Add shared assertDecisionMatches helper for allocation tests`	+13/-0

Miscellaneous

1 files

WriteLoadConstraintDeciderTests.java `Remove duplicate assertion helper moved to base test class`	+0/-14

qodo-code-review · 2025-12-04T20:36:41Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: No audit logs: The new allocation decision paths add or deny shard placements without emitting audit logs containing actor, timestamp, action, and outcome. Referred Code @Override public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) { if (indexBalanceConstraintSettings.isDeciderEnabled() == false \|\| isStateless == false \|\| hasFilters()) { return allocation.decision(Decision.YES, NAME, "Decider is disabled."); } Index index = shardRouting.index(); if (node.hasIndex(index) == false) { return allocation.decision(Decision.YES, NAME, "Node does not currently host this index."); } assert node.node() != null; assert node.node().getRoles().contains(INDEX_ROLE) \|\| node.node().getRoles().contains(SEARCH_ROLE); if (node.node().getRoles().contains(INDEX_ROLE) && shardRouting.primary() == false) { return allocation.decision(Decision.YES, NAME, "An index node cannot own search shards. Decider inactive."); } if (node.node().getRoles().contains(SEARCH_ROLE) && shardRouting.primary()) { return allocation.decision(Decision.YES, NAME, "A search node cannot own primary shards. Decider inactive."); } ... (clipped 53 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Edge-case handling: The decider assumes non-empty eligible nodes and positive shard counts, returning YES when empty, but lacks explicit guards or metrics for unusual states (e.g., all nodes filtered/shutdown) beyond assertions. Referred Code final ProjectId projectId = allocation.getClusterState().metadata().projectFor(index).id(); final Set<DiscoveryNode> eligibleNodes = new HashSet<>(); int totalShards = 0; String nomenclature = EMPTY; if (node.node().getRoles().contains(INDEX_ROLE)) { collectEligibleNodes(allocation, eligibleNodes, INDEX_ROLE); // Primary shards only. totalShards = allocation.getClusterState().routingTable(projectId).index(index).size(); nomenclature = "index"; } else if (node.node().getRoles().contains(SEARCH_ROLE)) { collectEligibleNodes(allocation, eligibleNodes, SEARCH_ROLE); // Replicas only. final IndexMetadata indexMetadata = allocation.getClusterState().metadata().getProject(projectId).index(index); totalShards = indexMetadata.getNumberOfShards() * indexMetadata.getNumberOfReplicas(); nomenclature = "search"; } assert eligibleNodes.isEmpty() == false; if (eligibleNodes.isEmpty()) { return allocation.decision(Decision.YES, NAME, "There are no eligible nodes available."); ... (clipped 4 lines) Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-12-04T20:38:10Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category Suggestion Impact

High-level

Decider should respect allocation filters

The IndexBalanceAllocationDecider currently deactivates when cluster-wide
allocation filters are used. It should instead incorporate these filters to
correctly determine the set of eligible nodes for balancing calculations.

Examples:

server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/IndexBalanceAllocationDecider.java [74-76]

        if (indexBalanceConstraintSettings.isDeciderEnabled() == false || isStateless == false || hasFilters()) {
            return allocation.decision(Decision.YES, NAME, "Decider is disabled.");
        }

server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/IndexBalanceAllocationDecider.java [147-153]

    private void collectEligibleNodes(RoutingAllocation allocation, Set<DiscoveryNode> eligibleNodes, DiscoveryNodeRole role) {
        for (DiscoveryNode discoveryNode : allocation.nodes()) {
            if (discoveryNode.getRoles().contains(role) && allocation.metadata().nodeShutdowns().contains(discoveryNode.getId()) == false) {
                eligibleNodes.add(discoveryNode);
            }
        }
    }

Solution Walkthrough:

Before:

// In IndexBalanceAllocationDecider.java

public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    if (hasFilters()) {
        // Decider deactivates itself if any cluster-wide filters are present.
        return allocation.decision(Decision.YES, NAME, "Decider is disabled.");
    }
    // ...
    Set<DiscoveryNode> eligibleNodes = new HashSet<>();
    collectEligibleNodes(allocation, eligibleNodes, role); // Considers all nodes of a certain role.
    // ...
}

private void collectEligibleNodes(RoutingAllocation allocation, Set<DiscoveryNode> eligibleNodes, DiscoveryNodeRole role) {
    for (DiscoveryNode discoveryNode : allocation.nodes()) {
        if (discoveryNode.getRoles().contains(role) && /* not shutting down */) {
            eligibleNodes.add(discoveryNode);
        }
    }
}

After:

// In IndexBalanceAllocationDecider.java

public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    // The check for hasFilters() is removed.
    // ...
    Set<DiscoveryNode> eligibleNodes = new HashSet<>();
    collectEligibleNodes(allocation, eligibleNodes, role); // Now respects filters.
    // ...
}

private void collectEligibleNodes(RoutingAllocation allocation, Set<DiscoveryNode> eligibleNodes, DiscoveryNodeRole role) {
    for (DiscoveryNode discoveryNode : allocation.nodes()) {
        if (discoveryNode.getRoles().contains(role) && /* not shutting down */
            && matchesFilters(discoveryNode)) { // Apply cluster-wide filters
            eligibleNodes.add(discoveryNode);
        }
    }
}

private boolean matchesFilters(DiscoveryNode node) {
    // Logic to check node against clusterRequireFilters,
    // clusterIncludeFilters, and clusterExcludeFilters.
    // Similar to FilterAllocationDecider logic.
}

Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a major design limitation that severely restricts the new decider's applicability in common real-world scenarios, such as tiered architectures, and proposes a crucial enhancement to make the feature broadly useful.

High

More

Apply changes for benchmark PR

8807d64

qodo-code-review bot added the Review effort 2/5 label Dec 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CLEAN] Synthetic Benchmark PR #135875 - Allocation: introduce a new decider that balances the index shard count among nodes #19

[CLEAN] Synthetic Benchmark PR #135875 - Allocation: introduce a new decider that balances the index shard count among nodes #19

Uh oh!

tomerqodo commented Dec 4, 2025 •

edited by qodo-code-review bot

Loading

Uh oh!

qodo-code-review bot commented Dec 4, 2025

Uh oh!

qodo-code-review bot commented Dec 4, 2025

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[CLEAN] Synthetic Benchmark PR #135875 - Allocation: introduce a new decider that balances the index shard count among nodes #19

Are you sure you want to change the base?

[CLEAN] Synthetic Benchmark PR #135875 - Allocation: introduce a new decider that balances the index shard count among nodes #19

Uh oh!

Conversation

tomerqodo commented Dec 4, 2025 • edited by qodo-code-review bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Benchmark PR elastic#135875

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-code-review bot commented Dec 4, 2025

PR Compliance Guide 🔍

Uh oh!

qodo-code-review bot commented Dec 4, 2025

PR Code Suggestions ✨

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tomerqodo commented Dec 4, 2025 •

edited by qodo-code-review bot

Loading